A Proposal on Evaluation Measures for RTE

Author

  • Richard Bergmair

Abstract

We outline problems with the interpretation of accuracy in the presence of bias, arguing that the issue is a particularly pressing concern for RTE evaluation. Furthermore, we argue that average precision scores are unsuitable for RTE, and should not be reported. We advocate mutual information as a new evaluation measure that should be reported in addition to accuracy and confidence-weighted score.
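The abstract gives no formulas, so the following is only a minimal Python sketch under stated assumptions. It assumes binary YES/NO entailment decisions and estimates mutual information in bits from the empirical joint distribution of gold labels and system decisions. It computes confidence-weighted score as in the RTE-1 challenge: pairs are ranked by decreasing confidence, and precision at each rank is averaged. The paper's own formulation of the mutual-information measure, for example any normalisation, may differ. The function names and the 80/20 test set are hypothetical. The sketch shows the bias problem concretely: a system that always answers YES scores well on accuracy yet carries no information about the gold label.

```python
import math
from collections import Counter


def accuracy(gold, pred):
    """Fraction of text-hypothesis pairs whose system decision matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def mutual_information(gold, pred):
    """Mutual information I(gold; pred) in bits, estimated from empirical counts.

    I = sum over (g, p) of P(g, p) * log2(P(g, p) / (P(g) * P(p))).
    """
    n = len(gold)
    joint = Counter(zip(gold, pred))
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    mi = 0.0
    for (g, p), c in joint.items():
        # With counts, P(g,p) / (P(g) P(p)) simplifies to c * n / (count(g) * count(p)).
        mi += (c / n) * math.log2(c * n / (gold_counts[g] * pred_counts[p]))
    return mi


def confidence_weighted_score(gold, pred, confidence):
    """CWS in the RTE-1 style (assumed here): rank pairs by decreasing confidence
    and average the precision (correct decisions so far / rank) over all ranks."""
    # sorted() is stable, so ties in confidence keep their input order.
    order = sorted(range(len(gold)), key=lambda i: -confidence[i])
    correct = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        correct += gold[i] == pred[i]
        total += correct / rank
    return total / len(gold)


# Hypothetical biased test set: 80 YES pairs and 20 NO pairs.
gold = ["YES"] * 80 + ["NO"] * 20
always_yes = ["YES"] * 100

print(accuracy(gold, always_yes))            # 0.8
print(mutual_information(gold, always_yes))  # 0.0
print(mutual_information(gold, gold))        # about 0.722 bits, the entropy of the gold labels
```

The always-YES baseline earns 0.8 accuracy purely from the label skew, while its mutual information with the gold labels is zero. Note also that on this skewed set even a perfect system reaches only about 0.722 bits, the entropy of the gold labels, rather than 1 bit.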


Similar resources

Textual Entailment as an Evaluation Framework for Metaphor Resolution: A Proposal

We aim to address two complementary deficiencies in Natural Language Processing (NLP) research: (i) Despite the importance and prevalence of metaphor across many discourse genres, and metaphor’s many functions, applied NLP has mostly not addressed metaphor understanding. But, conversely, (ii) difficult issues in metaphor understanding have hindered large-scale application, extensive empirical e...

Development and Usability Evaluation of an Online Tutorial for “How to Write a Proposal” for Medical Sciences Students

Background and Objective: Considering the importance of learning how to write a proposal for students, this study was performed to develop an online tutorial for “How to write a Proposal” for students and to evaluate its usability. Methods: This study is a developmental research and tool design. “Gamified Online Tutorial based on Self-Determination Theory (GOT-STD) Framework” became the basis f...

The usefulness of transient elastography, acoustic-radiation-force impulse elastography, and real-time elastography for the evaluation of liver fibrosis

BACKGROUND/AIMS Several noninvasive methods have recently been developed for the evaluation of liver fibrosis. The accuracy of transient elastography (TE), acoustic-radiation-force impulse (ARFI) elastography, and real-time elastography (RTE) in predicting liver fibrosis was evaluated. METHODS Seventy-four patients who had undergone a liver biopsy within the previous 6 months were submitted ...

SPARTE, a Test Suite for Recognising Textual Entailment in Spanish

The aim of Recognising Textual Entailment (RTE) is to determine whether the meaning of a text entails the meaning of another text, called the hypothesis. RTE systems can be applied to validate the answers of Question Answering (QA) systems. Once the answer to a question is given by the QA system, a hypothesis is built by turning the question plus the answer into an affirmative form. If the text (a given...

Análise de Medidas de Similaridade Semântica na Tarefa de Reconhecimento de Implicação Textual (Analysis of Semantic Similarity Measures in the Recognition of Textual Entailment Task)[In Portuguese]

In this work, we present a feature-based approach to the RTE (Recognizing Textual Entailment) task that verifies the similarity between two sentences, including syntactic and semantic aspects. The selected features come from the winning work of the RTE task of the workshop ASSIN (Semantic Similarity Evaluation and Textual Inference), with some changes and the addition of other semantic features. The eval...

Publication date: 2009